5 research outputs found
PCA and K-Means decipher genome
In this paper, we aim to give a tutorial for undergraduate students studying
statistical methods and/or bioinformatics. The students will learn how data
visualization can help in genomic sequence analysis. Students start with a
fragment of genetic text of a bacterial genome and analyze its structure. By
means of principal component analysis they ``discover'' that the information in
the genome is encoded by non-overlapping triplets. Next, they learn how to find
gene positions. This exercise on PCA and K-Means clustering enables active
study of the basic bioinformatics notions. Appendix 1 contains program listings
that go along with this exercise. Appendix 2 includes 2D PCA plots of triplet
usage in a moving frame for a series of bacterial genomes, from GC-poor to GC-rich
ones. Animated 3D PCA plots are attached as separate gif files. Topology
(cluster structure) and geometry (mutual positions of clusters) of these plots
depend clearly on GC-content.
Comment: 18 pages, with program listings for MatLab, PCA analysis of genomes
and additional animated 3D PCA plots
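The triplet-usage vectors that this exercise feeds into PCA and K-Means can be sketched as follows. This is a minimal illustration in Python, not the MatLab listings from the paper's Appendix 1; the function names and the window `width` and `step` defaults are assumptions chosen for the example.

```python
from itertools import product

# All 64 possible triplets over the DNA alphabet, in a fixed order.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {t: i for i, t in enumerate(TRIPLETS)}


def triplet_frequencies(fragment):
    """Relative frequencies of the non-overlapping triplets in a fragment."""
    counts = [0] * 64
    total = 0
    for i in range(0, len(fragment) - 2, 3):
        idx = INDEX.get(fragment[i:i + 3])
        if idx is not None:  # skip triplets with ambiguous letters such as N
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * 64
    return [c / total for c in counts]


def window_vectors(sequence, width=300, step=12):
    """Yield one 64-dimensional frequency vector per sliding window.

    width and step are illustrative defaults (assumptions); width should
    be a multiple of 3 so that every window is read as whole triplets.
    """
    for start in range(0, len(sequence) - width + 1, step):
        yield triplet_frequencies(sequence[start:start + width])
```

Each window becomes a point in 64-dimensional space. Projecting these points onto their first two or three principal components and clustering them gives the kind of plot the abstract describes. If the text really is read in non-overlapping triplets, windows that start at different positions modulo 3 inside a gene would be expected to fall into different clusters.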
PCA Beyond The Concept of Manifolds: Principal Trees, Metro Maps, and Elastic Cubic Complexes
Multidimensional data distributions can have complex topologies and variable
local dimensions. To approximate complex data, we propose a new type of
low-dimensional ``principal object'': a principal cubic complex. This complex
is a generalization of linear and non-linear principal manifolds and includes
them as a particular case. To construct such an object, we combine a method of
topological grammars with the minimization of an elastic energy defined for its
embedment into multidimensional data space. The whole complex is presented as a
system of nodes and springs and as a product of one-dimensional continua
(represented by graphs), and the grammars describe how these continua transform
during the process of optimal complex construction. The simplest case of a
topological grammar (``add a node'', ``bisect an edge'') is equivalent to the
construction of ``principal trees'', an object useful in many practical
applications. We demonstrate how it can be applied to the analysis of bacterial
genomes and for visualization of cDNA microarray data using the ``metro map''
representation. The preprint is supplemented by animation: ``How the
topological grammar constructs branching principal components
(AnimatedBranchingPCA.gif)''.
Comment: 19 pages, 8 figures
Elastic Maps and Nets for Approximating Principal Manifolds and Their Application to Microarray Data Visualization
Principal manifolds are defined as lines or surfaces passing through ``the
middle'' of data distribution. Linear principal manifolds (Principal Components
Analysis) are routinely used for dimension reduction, noise filtering and data
visualization. Recently, methods for constructing non-linear principal
manifolds were proposed, including our elastic maps approach which is based on
a physical analogy with elastic membranes. We have developed a general
geometric framework for constructing ``principal objects'' of various
dimensions and topologies with the simplest quadratic form of the smoothness
penalty which allows very effective parallel implementations. Our approach is
implemented in three programming languages (C++, Java and Delphi) with two
graphical user interfaces (VidaExpert
http://bioinfo.curie.fr/projects/vidaexpert and ViMiDa
http://bioinfo-out.curie.fr/projects/vimida applications). In this paper we
overview the method of elastic maps and present in detail one of its major
applications: the visualization of microarray data in bioinformatics. We show
that the method of elastic maps outperforms linear PCA in terms of data
approximation, representation of between-point distance structure, preservation
of local point neighborhood and representation of point classes in low-dimensional
spaces.
Comment: 35 pages, 10 figures
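To make the "elastic membrane" analogy concrete, the sketch below evaluates a quadratic elastic energy for a one-dimensional chain of nodes: an approximation term plus stretching and bending penalties. It is an illustration only. The function names, the coefficients `lam` and `mu`, and the normalisation of each term are assumptions, not the weighting used in the authors' C++, Java or Delphi implementations.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def elastic_energy(data, nodes, lam=0.1, mu=0.1):
    """Quadratic elastic energy of a chain of nodes fitted to data.

    lam (stretching) and mu (bending) are illustrative coefficients.
    """
    # Approximation term: mean squared distance from each data point
    # to its nearest node.
    approx = sum(min(sq_dist(x, y) for y in nodes) for x in data) / len(data)
    # Stretching term: squared length of every edge between neighbours.
    stretch = lam * sum(
        sq_dist(nodes[i], nodes[i + 1]) for i in range(len(nodes) - 1)
    )
    # Bending term: squared second difference at every interior node.
    bend = mu * sum(
        sum((a - 2 * b + c) ** 2
            for a, b, c in zip(nodes[i - 1], nodes[i], nodes[i + 1]))
        for i in range(1, len(nodes) - 1)
    )
    return approx + stretch + bend
```

Because every term is quadratic in the node positions, minimising the energy for a fixed assignment of data points to nodes reduces to solving a linear system, which is one reason a quadratic penalty is convenient.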
Mathematical Modelling of Cell-Fate Decision in Response to Death Receptor Engagement
Cytokines such as TNF and FASL can trigger death or survival depending on cell lines and cellular conditions. The mechanistic details of how a cell chooses among these cell fates are still unclear. The understanding of these processes is important since they are altered in many diseases, including cancer and AIDS. Using a discrete modelling formalism, we present a mathematical model of cell fate decision recapitulating and integrating the most consistent facts extracted from the literature. This model provides a generic high-level view of the interplays between the NFκB pro-survival pathway, RIP1-dependent necrosis, and the apoptosis pathway in response to death receptor-mediated signals. Wild type simulations demonstrate robust segregation of cellular responses to receptor engagement. Model simulations recapitulate documented phenotypes of protein knockdowns and enable the prediction of the effects of novel knockdowns. In silico experiments simulate the outcomes following ligand removal at different stages, and suggest experimental approaches to further validate and specialise the model for particular cell types. We also propose a reduced conceptual model implementing the logic of the decision process. This analysis gives specific predictions regarding cross-talks between the three pathways, as well as the transient role of RIP1 protein in necrosis, and confirms the phenotypes of novel perturbations. Our wild type and mutant simulations provide novel insights to restore apoptosis in defective cells. The model analysis expands our understanding of how the cell fate decision is made. Moreover, our current model can be used to assess contradictory or controversial data from the literature. Ultimately, it constitutes a valuable reasoning tool to delineate novel experiments.
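As a rough illustration of what a discrete (logical) model of a fate decision looks like, the toy Boolean network below has three mutually exclusive outcomes driven by one input signal. The update rules are hypothetical and chosen only for the example. They are not the published model and do not represent the actual regulatory logic of NFκB, RIP1 or the apoptosis pathway.

```python
from itertools import product

# Toy outcome nodes; the names echo the three fates discussed in the
# abstract, but the rules below are invented for illustration.
NODES = ("survival", "apoptosis", "necrosis")


def step(state, signal):
    """One synchronous update: each fate needs the signal and is
    switched off by either of the other two fates (hypothetical rule)."""
    s, a, n = state
    return (
        signal and not a and not n,
        signal and not s and not n,
        signal and not s and not a,
    )


def stable_states(signal):
    """Enumerate all 8 states and keep those unchanged by one update."""
    return [
        st for st in product((False, True), repeat=3)
        if step(st, signal) == st
    ]
```

In this toy network, `stable_states(True)` returns the three states in which exactly one fate is active, and `stable_states(False)` returns only the all-off state. A simulated knockdown can be mimicked by forcing one node to `False` inside `step` and recomputing the stable states.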